Publishing Set-Valued Data via Differential Privacy
نویسندگان
چکیده
Set-valued data provides enormous opportunities for various data mining tasks. In this paper, we study the problem of publishing set-valued data for data mining tasks under the rigorous differential privacy model. All existing data publishing methods for set-valued data are based on partitionbased privacy models, for example k-anonymity, which are vulnerable to privacy attacks based on background knowledge. In contrast, differential privacy provides strong privacy guarantees independent of an adversary’s background knowledge, computational power or subsequent behavior. Existing data publishing approaches for differential privacy, however, are not adequate in terms of both utility and scalability in the context of set-valued data due to its high dimensionality. We demonstrate that set-valued data could be efficiently released under differential privacy with guaranteed utility with the help of context-free taxonomy trees. We propose a probabilistic top-down partitioning algorithm to generate a differentially private release, which scales linearly with the input data size. We also discuss the applicability of our idea to the context of relational data. We prove that our result is (ǫ, δ)-useful for the class of counting queries, the foundation of many data mining tasks. We show that our approach maintains high utility for counting queries and frequent itemset mining and scales to large datasets through extensive experiments on real-life set-valued datasets.
منابع مشابه
Towards Privacy Preserving Publishing of Set-valued Data on Hybrid Cloud
Storage as a service has become an important paradigm in cloud computing for its great flexibility and economic savings. However, the development is hampered by data privacy concerns: data owners no longer physically possess the storage of their data. In this work, we study the issue of privacy-preserving set-valued data publishing. Existing data privacypreserving techniques (such as encryption...
متن کاملارایه یک روش جدید انتشار دادهها با حفظ محرمانگی با هدف بهبود دقّت طبقهبندی روی دادههای گمنام
Data collection and storage has been facilitated by the growth in electronic services, and has led to recording vast amounts of personal information in public and private organizations databases. These records often include sensitive personal information (such as income and diseases) and must be covered from others access. But in some cases, mining the data and extraction of knowledge from thes...
متن کاملPrivacy-preserving heterogeneous health data sharing
OBJECTIVE Privacy-preserving data publishing addresses the problem of disclosing sensitive data when mining for useful information. Among existing privacy models, ε-differential privacy provides one of the strongest privacy guarantees and makes no assumptions about an adversary's background knowledge. All existing solutions that ensure ε-differential privacy handle the problem of disclosing rel...
متن کاملDifferential Privacy via Weighted Sampling Set Cover
Differential privacy is a security guarantee model which widely used in privacy preserving data publishing, but the query result can’t be used in data research directly, especially in high-dimensional datasets. To address this problem, we propose a dimensionality reduction method. The core idea of this method is using a series of lowdimensional datasets to reconstruct a high-dimensional dataset...
متن کاملM-Partition Privacy Scheme to Anonymizing Set-Valued Data
In distributed databases there is an increasing need for sharing data that contain personal information. The existing system presented collaborative data publishing problem for anonymizing horizontally partitioned data at multiple data providers. M-privacy guarantees that anonymized data satisfies a given privacy constraint against any group of up to m colluding data providers. The heuristic al...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید
ثبت ناماگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید
ورودعنوان ژورنال:
- PVLDB
دوره 4 شماره
صفحات -
تاریخ انتشار 2011